Tag

#AI benchmarks

8 articles

GPT-5.6 Sol nearly matches Fable 5 on aggregated benchmarks at one-third the cost

OpenAI's GPT-5.6 Sol nearly matches Claude Fable 5 on benchmarks while costing one-third as much, signaling a major shift in AI pricing and performance dynamics.

Jul 9

Anthropic's Claude Fable 5 dominates new industry benchmarks at a steep premium

Anthropic's Claude Fable 5 leads industry benchmarks in finance, law, and medicine, but its $3.48-per-task price is more than 100 times that of competitors such as DeepSeek V4 Pro.

Jul 8

Inside Genebench-Pro

OpenAI introduces Genebench-Pro, a new benchmark for evaluating how well large language models understand biology and medicine. The tool aims to advance AI applications in healthcare and scientific research.

Jun 30

Fable 5 was beating GPT-5.5 on every major benchmark. Then the US government pulled it offline.

Anthropic's Fable 5 briefly outperformed OpenAI's GPT-5.5 before being shut down by the U.S. government, sparking speculation about national security concerns and AI regulation.

Jun 14

GPT-5.5 tops benchmarks but still hallucinates frequently at a 20 percent higher API cost

GPT-5.5 tops AI benchmarks but still hallucinates frequently, and its API cost has risen by 20%.

Apr 26

Alibaba's open model Qwen3.6 leads Google's Gemma 4 across agentic coding benchmarks

Alibaba's Qwen3.6 outperforms Google's Gemma 4 in agentic coding benchmarks, showcasing the power of efficient AI architectures.

Apr 17

AI benchmarks systematically ignore how humans disagree, Google study finds

This article explains how human disagreement in AI benchmarking can lead to unreliable performance metrics and why current practices need to evolve to account for annotation variability.

Apr 4

Cohere releases open source model that tops speech recognition benchmarks

Cohere has released an open-source speech recognition model that outperforms industry leader OpenAI's Whisper in benchmark tests.

Mar 27